RNA-Seq reveals that overexpression of TcUBP1 switches the gene expression pattern toward that of the infective form of Trypanosoma cruzi

Trypanosomes regulate gene expression mainly by using posttranscriptional mechanisms. Key factors responsible for carrying out this regulation are RNA-binding proteins, affecting subcellular localization, translation, and/or transcript stability. Trypanosoma cruzi U-rich RNA-binding protein 1 (TcUBP1) is a small protein that modulates the expression of several surface glycoproteins of the trypomastigote infective stage of the parasite. Its mRNA targets are known, but the impact of its overexpression at the transcriptome level in the insect-dwelling epimastigote cells has not yet been investigated. Thus, in the present study, by using a tetracycline-inducible system, we generated a population of TcUBP1-overexpressing parasites and analyzed its effect by RNA-Seq methodology. This allowed us to identify 793 up- and 371 downregulated genes with respect to the wildtype control sample. Among the upregulated genes, it was possible to identify members coding for the TcS superfamily, MASP, MUCI/II, and protein kinases, whereas among the downregulated transcripts, we found mainly genes coding for ribosomal, mitochondrial, and synthetic pathway proteins. RNA-Seq comparison with two previously published datasets revealed that the expression profile of this TcUBP1-overexpressing replicative epimastigote form resembles the transition to the infective metacyclic trypomastigote stage. We identified novel cis-regulatory elements in the 3′-untranslated region of the affected transcripts and confirmed that UBP1m, a signature TcUBP1 binding element previously characterized in our laboratory, is enriched in the list of stabilized genes. We can conclude that the overall effect of TcUBP1 overexpression on the epimastigote transcriptome is mainly the stabilization of mRNAs coding for proteins that are important for parasite infection.

Karina B. Sabalette 1,2, José R. Sotelo-Silveira 3,4, Pablo Smircich 3,4, and Javier G. De Gaudenzi 1,2,*
Trypanosomes are interesting models to study unusual mechanisms of gene expression regulation. Unlike most eukaryotes, trypanosomatids lack control at the level of transcription initiation for each individual gene. Instead, transcription by RNA polymerase II is polycistronic and transcript synthesis initiates from a few sites on each chromosome (1–3). Individual mature mRNAs are generated by 5′ trans-splicing (4) and 3′ polyadenylation (5). Owing to these biological constraints, these microorganisms control protein levels mainly by posttranscriptional events. The fate of mRNAs in the cell depends on the set of RNA-binding proteins (RBPs) associated with them, and these molecular interactions can also be organized into larger mRNP complexes forming stress granules or P-bodies (6). Among the critical aspects of mRNA metabolism are 5′- and 3′-end processing (7), nuclear export (8), mRNA stability (9), and translation (10–12). Over the years we have contributed, in part, to a better understanding of these mechanisms in Trypanosoma cruzi, an early branching eukaryotic unicellular parasite causing Chagas disease (8, 13–19). Particularly, the first RNA-Seq transcriptome and translatome for this parasite showed that translation regulation plays a critical role in governing gene expression profiles during T. cruzi differentiation (10). We and other authors have reported some of the molecular mechanisms that might operate to explain this regulation (8, 16, 20). At all these regulatory points, RBPs can intervene as crucial trans-acting factors and mediate parasite differentiation in both T. cruzi and Trypanosoma brucei (21, 22).
The present study focused on T. cruzi U-rich RBP 1 (TcUBP1), one of the first trypanosome RNA-recognition motif (RRM)-containing proteins described. TcUBP1 is an exclusive trypanosomal RBP having a single RRM (23) with the characteristic β1α1β2β3α2β4 fold. It is expressed in all stages of the parasite life cycle and regulates the abundance of a large number of genes containing U-rich elements (19, 24). Some of the ribonucleoprotein complexes containing TcUBP1 are developmentally regulated, as determined by expression profiling of target transcripts and RT-PCR analysis of coimmunoprecipitated RNAs (17, 18).
The ability of T. cruzi to survive in the mammalian host is in part due to the expression of a plethora of surface proteins and signaling genes, which include the trans-sialidase and trans-sialidase-like (TcS) superfamily, mucins, and mucin-associated surface proteins, among others (25, 26). In previous studies on partners of the TcUBP1-mRNP complex by in vivo RBP immunoprecipitation, we found several transcripts encoding TcS proteins (24). Interestingly, TcUBP1, in synchrony with nutritional deficiency, is known to mediate differentiation of T. cruzi epimastigotes into infective metacyclic trypomastigotes (27) by coordinating a timely developmental program (28). TcS members are surface glycoprotein-coding genes expressed only in trypomastigote forms, but the in vivo interaction between TcUBP1 and TcS RNAs occurs in both replicative and infective cells. In this regard, ectopic overexpression of TcUBP1 in replicative forms resulted in >10-fold upregulation of numerous TcS mRNAs and in changes in their subcellular localization from the posterior zone to the perinuclear region of the cytoplasm, as is typically observed in the infective stages. This observation has led to the hypothesis that TcUBP1 can promote a switch toward the expression profile of infective trypomastigotes in T. cruzi by increasing the mRNA levels and translation rates of an RNA regulon for trypomastigote surface glycoproteins during parasite development. The posttranscriptional paradigm of RNA regulons was first posited by Keene and Lager almost two decades ago (29–31) and suggests that, by recognizing structural and/or sequence RNA elements, cells can coregulate subsets of transcripts with a shared physiological function.
For an RBP of interest, identifying the in vivo binding sites is a critical step toward understanding its function. However, the complete influence of TcUBP1 overexpression in the determination of the parasite transcriptome is not known and its precise binding sites have not been described. Thus, the aim of the present study was to perform an RNA-Seq analysis on epimastigote samples overexpressing UBP1.

Identification of differentially expressed genes after TcUBP1 overexpression
To gain comprehensive insights into the regulatory role of TcUBP1, we analyzed the impact of TcUBP1 overexpression on the T. cruzi CL Brener transcriptome. For this, TcUBP1-GFP-induced epimastigotes (UBP1-OE) or control wildtype samples (WT) were subjected to RNA-Seq analysis (see Experimental procedures). After assembly and annotation, we identified a total of 9039 genes (Fig. 1A). The expression levels of each gene of the UBP1-OE and WT populations were calculated by mapping clean read sets onto the reference transcriptome of the CL Brener Esmeraldo-like strain (TriTrypDB-59_TcruziCLBrenerEsmeraldo-like_Genome.fasta). The data from different libraries were normalized using the normalization method in the software package DESeq2 (32).
The distribution pattern of transcript expression in UBP1-induced versus WT parasite populations was analyzed in detail. Results showed that 64% of the total genes (5,741) were significantly expressed, with false discovery rate (FDR)-adjusted p values lower than 0.05, and that 13% of the genes (1,164) were differentially expressed (|log2 fold change| > 1, FDR-adjusted p value < 0.05; File S1). The percentages of up- and downregulated genes in UBP1 tetracycline-induced parasites were 8.8% and 4.1%, respectively. In addition, the expression patterns for all genes (A), for the significantly expressed genes (B), and for the most correlated genes (i.e., genes that were found overexpressed in one sample and underexpressed in the other and vice versa) (C), in control and OE parasites are shown in Figure 1. The light blue, white, and orange colors indicate less expressed, medium-level expressed, and highly expressed genes, respectively (Fig. 1A). By analyzing the complete expression profile of the up- and downregulated genes, we concluded that the global effect of UBP1-OE is mostly to stabilize the transcriptome, since nearly 800 genes were upregulated at least 2-fold, whereas fewer than half as many were downregulated at least 2-fold (|log2 fold change| > 1, FDR-adjusted p value < 0.05) (Fig. 1B).
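The thresholds used above (|log2 fold change| > 1 and FDR-adjusted p value < 0.05) can be illustrated with a minimal Python sketch. It assumes a table of per-gene results resembling DESeq2 output; the record type, the function name, and the toy values are hypothetical and are not taken from File S1.

```python
from typing import List, NamedTuple, Tuple


class GeneResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    gene_id: str
    log2_fold_change: float  # log2(OE / WT)
    padj: float              # FDR-adjusted p value


def classify(results: List[GeneResult],
             lfc_cutoff: float = 1.0,
             fdr_cutoff: float = 0.05) -> Tuple[List[str], List[str]]:
    """Split genes into up- and downregulated lists using the stated cutoffs."""
    up, down = [], []
    for r in results:
        if r.padj < fdr_cutoff and abs(r.log2_fold_change) > lfc_cutoff:
            (up if r.log2_fold_change > 0 else down).append(r.gene_id)
    return up, down


# Toy input; gene names and values are invented for illustration only.
toy = [
    GeneResult("gene_a", 6.15, 1.1e-58),  # strongly upregulated
    GeneResult("gene_b", -1.40, 0.003),   # downregulated
    GeneResult("gene_c", 0.70, 0.010),    # significant but below the fold cutoff
    GeneResult("gene_d", 2.20, 0.200),    # large change but not significant
]
up, down = classify(toy)
print("up:", up, "down:", down)

# Percentages of the 9039 annotated genes, using the counts reported in the text.
total_genes = 9039
print(f"up: {100 * 793 / total_genes:.1f}%  down: {100 * 371 / total_genes:.1f}%")
```

The last line reproduces the 8.8% and 4.1% figures from the reported gene counts.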
A Venn diagram was generated to represent the differentially expressed genes mentioned above (Fig. 2A). This included 793 upregulated and 371 downregulated genes in TcUBP1-induced samples (File S1). The number of upregulated genes in UBP1-OE parasites was two times higher than that of downregulated genes. In addition, results showed that 33 genes, most of which coded for hypothetical proteins, were expressed exclusively in UBP1-OE samples, and that 11 genes, mostly related to the chromosome organization process, were expressed exclusively in WT parasites (File S2). As expected, TcUBP1 (TcCLB.507093.220) was 71 times higher in the OE samples (FDR-adjusted p value = 1.1E-58), showing the largest difference between the OE and WT samples. This value reflected the expected overexpression of TcUBP1 as a consequence of the pTcINDEX induction with tetracycline. A volcano plot of gene expression in UBP1-induced and WT parasites is shown in Figure 2B, where significantly expressed genes are separated from the nonsignificantly expressed genes by different color codes. The 20 most statistically significant up- and downregulated genes are toward the top, labeled with gene symbols together with TcUBP1. Also, the Top 10 list with the most differentially over- or underexpressed genes (based on fold change values) is shown in Table 1 and depicted in Figure 2C.
TcUBP1 overexpression leads to upregulation of cell-surface trypomastigote glycoproteins and downregulation of ribosomal and mitochondrial proteins
Gene ontology (GO) analyses using TriTrypDB performed on genes over- and underexpressed in UBP1-OE parasites showed a distribution of 18 and 107 GO overrepresented terms, respectively. The enrichment chart was plotted showing each significant GO term and the percentage of genes present in our differentially expressed genes compared with the background for each category (Fig. 3). The complete distribution is provided in File S3. A plot for all three GO domains (biological process, molecular function, and cellular component) is presented in Figure 3A (upregulation) and Figure 3B (downregulation). The GO analysis of differentially expressed genes with significant differences revealed that they are involved in critical biological processes and cellular components, such as pathogenesis, cell adhesion, and protein phosphorylation (in the case of upregulated genes), and in ribosomes, GTPase activity, and mitochondria (in the case of downregulated genes). DAVID (Database for Annotation, Visualization, and Integrated Discovery) enrichment analysis classified all the enriched protein domains into three categories: InterPro, Pfam, and Smart. DAVID annotation products were recovered using the online GeneID Conversion tool. Of 793 genes in the upregulated group, 791 were accepted by DAVID for the analysis and assigned to 9 clusters, whereas all 371 genes in the downregulated group were accepted and assigned to 8 clusters (File S4). Based on FDR-adjusted p values, among the top enriched domains for the upregulated group, the trypanosome sialidase, protein kinase, and RNA-binding domains had the largest number of genes (Table 2). For the downregulated genes, the most abundant classes were found to be the mitochondrial substrate/solute carrier, 40S ribosomal protein, and small GTP-binding protein domains.
The results obtained using the graphical tool of the ShinyGO web application are shown in Fig. S1.
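As an illustration of how overrepresentation of a GO term or protein domain can be scored, the following Python sketch computes a fold enrichment and a one-sided hypergeometric p value. This is a generic textbook approach shown under stated assumptions; the text does not specify the exact statistic applied by TriTrypDB, DAVID, or ShinyGO, and the term counts in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) when N genes are drawn from M, of which n carry the annotation."""
    upper = min(n, N)
    favorable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favorable / comb(M, N)


def fold_enrichment(k: int, M: int, n: int, N: int) -> float:
    """Ratio of the annotated fraction in the gene list to that in the background."""
    return (k / N) / (n / M)


# Background and list sizes follow the text (9039 genes, 793 upregulated);
# the term counts (200 annotated genes, 60 of them in the list) are invented.
background, annotated = 9039, 200
list_size, annotated_in_list = 793, 60

print(f"fold enrichment: {fold_enrichment(annotated_in_list, background, annotated, list_size):.2f}")
print(f"p value: {hypergeom_sf(annotated_in_list, background, annotated, list_size):.3e}")
```

In practice, the p values obtained for all tested terms would then be corrected for multiple testing, in line with the FDR-adjusted values reported above.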
We then investigated transcript expression by carrying out a comparative analysis of several functional gene groups. Based on the data presented above, we manually classified the majority of sequences obtained from UBP1-OE parasites into 16 groups. The distribution of transcripts among these groups was analyzed using violin plots showing expression values (log2 fold change OE/WT) (Fig. 4).
Results confirmed that among the most abundant transcripts in the UBP1-OE transcriptome are those coding for cell-surface glycoproteins, protein kinases/phosphatases, and RNA-binding proteins (Fig. 4, A, E, F and P) and that among the least abundant transcripts are those coding for ribosomal proteins, mitochondrial transcripts, and some Dispersed Gene Family hits (Fig. 4, B, N and O), with the cluster of ribosomal proteins having the highest number of downregulated hits. The dispersed gene family is large, with many of its members predicted to have transmembrane domains and reported to be more abundant in the amastigote stage than in trypomastigotes and epimastigotes (33).
Clearly, the most abundant cluster among the upregulated genes in UBP1-OE samples was that of surface membrane-associated proteins. Within this group, we identified 171 trans-sialidase/trans-sialidase-like genes, 108 mucin-associated surface proteins, and 88 mucins (Fig. 2B; p value < 1E-80 and log2 fold change > 1.9). Notably, we also observed upregulation of three trans-sialidase-like mRNAs that had been previously reported to be upregulated in UBP1-OE parasites (28) (Fig. S2). These three transcripts harbor a known structural TcUBP1 RNA-binding element in their 3′-UTRs, previously described in our laboratory and termed UBP1m (24). In addition, in the upregulated list, we observed 33 protein kinases and 15 protein phosphatases (see Discussion).
The expression profile of UBP1-OE epimastigotes resembles that of the transcriptome of trypomastigote infective stages
We then performed a comparative transcriptomic analysis using the RNA-Seq data obtained from Smircich et al. (35) and Li et al. (36) to compare the expression profiles of TcUBP1-overexpressing parasites with those of the four T. cruzi stages. In order to compare sets of RNA-Seq data from different experiments, we used an ad hoc pipeline to map the reads from these laboratories to the reference Esmeraldo-like CL Brener genome and then compared the fold change of the expression values. We calculated the percentage of regulated transcripts in UBP1-OE parasites among the most up- and downregulated genes in a pairwise comparison between the metacyclic trypomastigote (MT), cell-derived trypomastigote (Trypo), epimastigote (Epi), and amastigote (Ama) stages. We clustered the different fold change values for each pairwise comparison into groups of up- and downregulated genes with >1.5-, >2-, >2.83-, >4-, or >8-fold change differences between two stages (UBP1-OE versus Epi, MT versus Epi, Trypo versus Epi, Trypo versus Ama, and Epi versus Ama).
When analyzing the whole range (<4- to >8-fold change), we found that the UBP1-OE transcriptome showed the highest similarity with the Trypo/Epi and MT/Epi datasets (genes overrepresented in Trypo or MT with respect to Epi). The expression profile of UBP1-OE coincided 43.0% with the MT/Epi and 43.9% with the Trypo/Epi ratios. Particularly, for the upregulated genes (>1.5- to >4-fold change), the Trypo/Epi comparison showed >60% similarity to UBP1-OE. The third most similar dataset to UBP1-OE was Trypo/Ama, which displayed average percentage values of 58% in the upregulated genes. No significant coverage was found for any of the up- or downregulated transcripts in the Epi/Ama comparison. The similarity between datasets indicates that the transcriptome of UBP1-induced parasites has an expression profile that resembles that of the trypomastigote and metacyclic trypomastigote infective forms (ANOVA with post hoc Tukey test, p value = 0.00509). This can be visualized by different statistically significant colored clusters in the heatmap depicted in Figure 6A (Tukey multiple comparisons: MT/Epi - Epi/Ama, p value = 0.0121; Trypo/Ama - Epi/Ama, p value = 0.0096; and Trypo/Epi - Epi/Ama, p value = 0.0417).
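One plausible way to compute the percentage of shared regulated transcripts at each fold-change threshold is sketched below in Python. This is an assumption-laden illustration rather than the ad hoc pipeline used in the study: the function name, the 2-fold cutoff applied to the UBP1-OE data, and the toy log2 fold change values are all hypothetical.

```python
from math import log2
from typing import Dict, Optional

# Fold-change thresholds listed in the text.
FOLD_THRESHOLDS = (1.5, 2.0, 2.83, 4.0, 8.0)


def percent_shared_up(reference: Dict[str, float],
                      query: Dict[str, float],
                      fold: float,
                      query_fold: float = 2.0) -> Optional[float]:
    """Percentage of genes upregulated more than `fold` in the reference
    comparison that are also upregulated more than `query_fold` in the query.
    Both inputs map gene IDs to log2 fold changes. Returns None when no
    reference gene passes the threshold."""
    ref_cut, query_cut = log2(fold), log2(query_fold)
    selected = [g for g, lfc in reference.items() if lfc > ref_cut and g in query]
    if not selected:
        return None
    shared = sum(1 for g in selected if query[g] > query_cut)
    return 100.0 * shared / len(selected)


# Invented log2 fold changes for a stage comparison and for UBP1-OE versus Epi.
stage_vs_epi = {"g1": 3.2, "g2": 1.1, "g3": 0.2, "g4": 2.5}
ubp1_oe_vs_epi = {"g1": 2.0, "g2": 0.1, "g3": 1.5, "g4": 1.3}

for fold in FOLD_THRESHOLDS:
    print(fold, percent_shared_up(stage_vs_epi, ubp1_oe_vs_epi, fold))
```

The same function applied to sign-inverted log2 fold changes would give the corresponding percentages for downregulated genes.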
Next, we obtained fold change values for 1737 genes from the RNA-Seq experiments (File S5). This RNA-Seq expression table was used to perform a principal component analysis (PCA) to compare the dispersion of the different datasets. The horizontal axis (PC1) describes 64.2% of the variability, and, considering this component, the sample UBP1-OE/Epi is distinctly located closer to the MT and Trypo experiments than to Epi/Ama. Thus, similar to that shown in Figure 6A, this analysis showed that the expression profile of the UBP1-OE population is more similar to that of the infective stages (MT/Epi, Trypo/Epi, and Trypo/Ama) than to that of the replicative stage (Epi/Ama) (Fig. 6B).
These expression values were then used to calculate the Pearson correlation of all the samples, to which we also added the expression values of Ama/Epi, Ama/Trypo, Epi/Trypo, and Epi/MT. The column corresponding to UBP1-OE/Epi is boxed. Again, the highest correlation was observed with the Trypo/Epi (0.5576), Trypo/Ama (0.4972), and MT/Epi (0.4610) datasets (Fig. 6C). No significant correlation was found between UBP1 and any of the remaining RNA-Seq datasets. The analysis of shared genes, PCA, and correlation between the different experiments analyzed showed that UBP1-overexpressing parasites have an expression profile that resembles that of infective forms of T. cruzi.
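For reference, the Pearson correlation between two vectors of log2 fold changes can be computed as in the following Python sketch. The input values are invented for illustration and do not correspond to File S5.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented log2 fold changes for the same five genes in two comparisons.
ubp1_oe_vs_epi = [2.1, -0.4, 1.3, -1.8, 0.2]
trypo_vs_epi = [3.0, 0.1, 0.9, -2.2, -0.3]
print(f"r = {pearson(ubp1_oe_vs_epi, trypo_vs_epi):.4f}")
```

Swapping numerator and denominator of a ratio (e.g., Epi/Trypo instead of Trypo/Epi) inverts the sign of every log2 fold change and therefore the sign of the correlation.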
Identification of cis-elements in the 3′-UTR of genes regulated by TcUBP1 overexpression
We next searched this transcriptome for the occurrence of a known structural UBP1 RNA-binding element, UBP1m, previously described in our laboratory (24), and also for de novo sequence motifs. The most abundant mRNA targets previously identified for TcUBP1 encode energy metabolism and cell-surface membrane glycoproteins. As mentioned above, the transcriptome analysis showed that, in UBP1-OE cells, these groups are either over- or underrepresented: cell-surface trypomastigote glycoproteins are upregulated and mitochondrial transcripts coding for proteins related to energy metabolism are downregulated. With this result in mind, we decided to analyze how many of the mRNAs impacted by TcUBP1 overexpression could be direct interacting targets. For this purpose, we used the presence of the characteristic binding element UBP1m (24) as a target criterion. We then evaluated the motif coverage of UBP1m in the up- and downregulated genes in the UBP1-OE sample. We looked at all transcripts that were expressed with fold changes ranging from < −8× to > 8×.
The downregulated genes showed no significant differences in the presence of the UBP1m motif. Similarly, we observed that nonregulated genes, found between the −1.5-fold and 1.5-fold categories (log2 fold change = −0.58 to log2 fold change = 0.58), presented a 5% UBP1m coverage (147 out of 2829). However, as the fold change increased in the upregulated genes, the abundance of the UBP1m element also increased (see Fig. 7A). Genes with >5-fold upregulation (log2 fold change > 2.32) showed an 8% (4 of 46) presence of UBP1m, genes upregulated >6-fold showed 15% motif coverage (4 of 27), and genes upregulated >8-fold showed the highest motif coverage (18%; 2 of 11). This is consistent with the idea that the UBP1m motif might have a stabilizing effect on the mRNAs containing it. Such a trend is expected, given that UBP1m was originally detected in UBP1-immunoprecipitated mRNAs. Therefore, mRNAs stabilized by UBP1 (and containing the motif) were easily purified, whereas those destabilized and possibly containing other elements did not precipitate. We concluded that the UBP1 binding motif was enriched in the group of upregulated genes (with log2 fold change values >2.32) compared with all the remaining groups (ANOVA analysis, post hoc Tukey, p value = 1.47E-05).
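The relationship between the fold-change categories and their log2 equivalents, together with the motif coverage per category, can be recomputed from the counts given above. The short Python sketch below uses only those reported counts; the labels and the helper name are illustrative. Unrounded fractions may differ slightly from the whole-number percentages quoted in the text.

```python
from math import log2


def coverage_percent(with_motif: int, total: int) -> float:
    """Percentage of genes in a category that carry the UBP1m element."""
    return 100.0 * with_motif / total


# (genes with UBP1m, genes in category), as reported in the text.
categories = {
    "not regulated (-1.5x to 1.5x)": (147, 2829),
    ">5-fold up": (4, 46),
    ">6-fold up": (4, 27),
    ">8-fold up": (2, 11),
}

for label, (hits, total) in categories.items():
    print(f"{label}: {hits}/{total} = {coverage_percent(hits, total):.1f}%")

# Fold changes expressed on the log2 scale used in the text.
for fold in (1.5, 5, 8):
    print(f"{fold}-fold = log2 fold change {log2(fold):.2f}")
```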
After that, by using a cutoff value of 4-fold change, we detected 89 up- and 14 downregulated genes in UBP1-OE parasites. For each group, a length of 350 nt downstream from the coding sequences was downloaded using TcruziDB to obtain sequences resembling the 3′-UTR, in agreement with data previously reported for trypanosomes (37). The upregulated list was partitioned into two subgroups: set 1 (composed of 45 genes) and set 2 (composed of 44 genes). Thus, we first searched motifs in set 1 and then in set 2. To this end, we ran the motif prediction tool TRAWLER (http://trawler.monash.edu.ar) with default parameters, using set 1 as input sequences and the downregulated group as the background list. Results showed three candidate motifs: family_1 (5′-TVTMTATATATATATATABR-3′, Z-score: 64.78), family_2 (5′-NNTTTRCTTTB-3′, Z-score: 79.50), and family_3 (5′-CTCYTSCY-3′, Z-score: 61.92). The WebLogo representation indicated that the family_1 motif is rich in AT content, whereas the family_3 motif has a CT-rich sequence composition (Fig. 7B). When the relative frequencies of these motifs within the 3′-UTR of both the input (set 1) and the control (set 2) datasets were examined using the FIMO software, an enrichment of all elements was observed: family_1 showed 7.47 total hits/kbps in set 1 and 5.24 total hits/kbps in set 2; family_2 showed 3.86 total hits/kbps in set 1 and 3.11 total hits/kbps in set 2; and family_3 showed 1.64 total hits/kbps in set 1 and 1.03 total hits/kbps in set 2.
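To make the hits/kbps measure concrete, the following Python sketch scans sequences for an IUPAC consensus such as family_3 and reports the number of matches per kilobase. Note that this is a simplified exact-consensus scan using regular expressions, not the position weight matrix scoring performed by FIMO, so it would not reproduce the values above; the function names and the toy sequences are hypothetical.

```python
import re
from typing import Iterable

# IUPAC nucleotide codes expanded to regular expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_regex(consensus: str) -> "re.Pattern[str]":
    """Compile an IUPAC consensus into a regex; the lookahead allows overlaps."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?={body})")


def count_hits(consensus: str, sequence: str) -> int:
    """Number of (possibly overlapping) consensus matches in one sequence."""
    return sum(1 for _ in motif_regex(consensus).finditer(sequence.upper()))


def hits_per_kb(consensus: str, sequences: Iterable[str]) -> float:
    """Total matches divided by total sequence length, scaled to 1000 nt."""
    seqs = list(sequences)
    total_length = sum(len(s) for s in seqs)
    total_hits = sum(count_hits(consensus, s) for s in seqs)
    return 1000.0 * total_hits / total_length


# Invented 3'-UTR-like fragments; family_3 consensus taken from the text.
toy_utrs = ["AACTCTTCCTGG", "GGGGAAAATTTT"]
print(count_hits("CTCYTSCY", toy_utrs[0]))
print(f"{hits_per_kb('CTCYTSCY', toy_utrs):.2f} hits/kb")
```

Comparing the value obtained for an input set with that of a control set gives the kind of relative frequency contrast described above.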
We next generated an in-house bioinformatic pipeline to predict the binding of proteins to RNA sequence motifs based on integrated published data sources.¹ For this purpose, we used the sequence elements as a query to search for proteins with the ability to bind them according to Tomtom motif analysis (https://meme-suite.org/meme/tools/tomtom, MEME Suite). Thus, heterologous interacting proteins were predicted as binders by searching against the RNAcompete database composed of 244 eukaryotic RBPs, with a p value < 0.004 (38, 39). Then, highly similar T. cruzi proteins were identified as binders of these motifs by using the previous RNAcompete hits as queries in BLASTP searches against a T. cruzi RBP database composed of 285 sequences with RRM (23), zinc finger, PUF, Alba, KH, and PIWI domains (22) (blastp E_value < 1E-08 and subject coverage ≥ 50%, File S6). After this step, an RNA-protein interaction prediction software that uses only primary sequence information was systematically run on all previously obtained candidates to select those reliable interactions that have binding probabilities > 0.5, by using an SVM classifier (40). The results are shown in Figure 7B, with five putative RBP targets for family_1 (UBP1, UBP2, RBP5A/B, PABP1, and DRBD11B), three for family_2 (UBP1, UBP2, and RBP3), and only one for family_3 (DRBD3A). Of note, TcUBP1 was predicted to bind both family_1 and family_2 RNA motifs. Not surprisingly, the polypyrimidine tract-binding protein DRBD3A/PTB1 (TcCLB.506649.80) was predicted to bind the C/T-rich family_3 sequence. Moreover, we validated our predictions by using molecular docking experiments. To this end, we used the TcUBP1 protein structure predicted by the AlphaFold database (Q4E1N5.pdb) and obtained the 3D structures of the RNA motifs with the 3DRNA software (https://bio.tools/3dRNA).
We then ran the HDOCK docking server (41) and checked the RNA-protein interactions using the transcript sequences UAUAUAUAUAUAUAUAUAUA (as family_1 RNA ligand) and UUUGCUUUU (as family_2 RNA ligand). As positive controls, we used two experimental sequences reported to be binders of TcUBP1: UBP1m (5′-UGGCGCAUCCAUGCCUGGAUGCGCCG-3′) (24) and UBP1m28 (5′-UUUUGGAGGAAGUUUUUUUUGGGG-3′).² In all cases, we obtained HDOCK confidence scores >0.75, suggesting that these molecular interactions occur (Table 3). Taken together, these results suggest that the family_1 and family_2 motifs, identified in the 3′-UTRs of the upregulated transcripts, might be involved in the interaction with TcUBP1.
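The successive selection criteria of the prediction pipeline described above (Tomtom p value < 0.004, BLASTP E value < 1E-08 with subject coverage ≥ 50%, and binding probability > 0.5) can be summarized as a simple filter. The Python sketch below is a schematic illustration only: the record layout, the field names, and the candidate entries are hypothetical, and the scores would in practice come from the external tools named in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One motif/RBP pair with scores collected from each step (hypothetical layout)."""
    motif: str
    rbp: str
    tomtom_p: float             # Tomtom match to an RNAcompete motif
    blast_evalue: float         # BLASTP hit against the T. cruzi RBP database
    subject_coverage: float     # percent of the subject sequence covered
    binding_probability: float  # output of the sequence-based SVM predictor


def passes_all_filters(c: Candidate) -> bool:
    """Apply the thresholds stated in the text, in pipeline order."""
    return (c.tomtom_p < 0.004
            and c.blast_evalue < 1e-8
            and c.subject_coverage >= 50.0
            and c.binding_probability > 0.5)


def select(candidates: List[Candidate]) -> List[Candidate]:
    return [c for c in candidates if passes_all_filters(c)]


# Invented candidates; names and scores are placeholders.
toy = [
    Candidate("family_x", "rbp_1", 0.001, 1e-20, 80.0, 0.90),
    Candidate("family_x", "rbp_2", 0.010, 1e-30, 95.0, 0.95),  # fails Tomtom cutoff
    Candidate("family_y", "rbp_3", 0.002, 1e-05, 70.0, 0.80),  # fails E value cutoff
    Candidate("family_y", "rbp_4", 0.003, 1e-12, 40.0, 0.70),  # fails coverage cutoff
    Candidate("family_y", "rbp_5", 0.003, 1e-12, 60.0, 0.40),  # fails probability cutoff
]
print([c.rbp for c in select(toy)])
```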

Discussion
Trypanosomes harbor epigenetic modifications that change between their life cycle stages (42). Nonetheless, it is broadly accepted that transcription by RNA polymerase II in these pathogens deviates from the standard eukaryotic paradigm. In T. cruzi, there is no dedicated promoter for each gene, resulting in polycistronic transcription, and thus gene expression regulation depends heavily on large posttranscriptional networks (43). In the present work, we used an in vitro system based on the inducible expression of a GFP-tagged UBP1 to monitor transcriptome changes during the differentiation of T. cruzi from noninfectious epimastigotes to infectious metacyclic trypomastigotes. In addition, we performed the bioinformatic analysis of two RNA-Seq samples, with three biological replicates each (Fig. S3), highlighting the differential transcript abundance and providing a data source to understand how this parasite becomes infectious.
Several lines of evidence support the role of certain RBPs as key regulators of trypanosome differentiation (21, 22, 44–52). In a previous work, we showed that TcUBP1 binds to structural binding elements highly enriched in transcripts coding for cell-surface virulence factors associated with the metacyclic trypomastigote developmental stage (24). In epimastigotes, translation of these transcripts is diminished and they are localized in the posterior zone of the cell until a stimulus such as the ectopic expression of TcUBP1-GFP triggers the metacyclogenesis program, upregulating and mobilizing these trypomastigote stage-specific mRNAs to polysomes (28).
In the present study, when analyzing transcripts exclusive to a given experimental condition, we found that the mRNAs expressed only in WT parasites are mostly related to chromosome organization, while those exclusively expressed in TcUBP1-GFP parasites code for proteins involved in mitochondrial RNA processing (File S2). Our results also showed that 3035 of 9039 genes (34%) had significant differences in mRNA steady-state levels in TcUBP1-OE parasites compared with WT parasites (|log2 fold change| > 0.58, FDR < 0.05). Notably, a high number of genes coding for trypomastigote cell-surface glycoproteins are stabilized in the transcriptome of the UBP1-transgenic epimastigotes (Fig. 5). The transcriptome difference between UBP1-OE and Epi-WT is similar to the one observed between the quiescent infective metacyclic trypomastigote (MT) and Epi (Fig. 6).
In agreement with the data described for the closely related parasite T. brucei (53), in the present study, we obtained an expression profile resembling that of the quiescent infectious trypomastigote parasites by overexpressing a single RRM protein, UBP1, in noninfectious epimastigotes. This conclusion is based upon two results. On the one hand, we observed the upregulation of RNA abundances of numerous cell-surface trypomastigote glycoproteins, including members of the TcS superfamily (Figs. 4A and 5). This UBP1-OE transcriptome confirms our data on the glycoprotein RNA regulon of TcUBP1-expressing parasites (28). On the other hand, we noticed that the genes coding for ribosomal proteins were downregulated in the UBP1-OE parasites (Figs. 4B and 5). This decrease in the abundance of ribosomal protein-coding mRNAs is consistent with the translational repression previously reported for metacyclic trypomastigotes (10, 54). Thus, distinctive gene expression hallmarks of the trypomastigote stage (55) are detected in the transcriptome of UBP1-overexpressing epimastigotes. These results further support posttranscriptional control as a critical regulatory mechanism required for parasite differentiation.
It has been reported that translation is strongly regulated during the T. cruzi cell cycle, causing variation in specific protein levels (56). In humans, multifunctional RBPs can regulate more than a single aspect of RNA metabolism. Schneider-Lunitz and coworkers have identified dozens of RBPs that influence mRNA abundance and translation efficiency of their targets (57). TcUBP1 could also be acting with this dual functionality. Previous results from our laboratory have demonstrated that an endogenous TcUBP1 fraction is associated with polysomes (58), and other researchers have also found TcUBP1 by means of a polysome proteomics approach (59). In this regard, and as a consequence of TcUBP1 overexpression in epimastigotes, we also observed a change in the subcellular localization of cell-surface trypomastigote glycoprotein-coding transcripts, resembling the typical distribution of the metacyclic trypomastigote infective stage (28). Moreover, in our previous work, we also detected that trypomastigotes derived from TcUBP1 transgenic epimastigotes have an increased capacity for infection, an effect that has already been seen to be associated with increased protein expression of surface glycoproteins (55, 60). All these observations could also suggest a possible regulatory role of TcUBP1 in the translation rate of trypomastigote-specific mRNAs. To investigate the degree of protein synthesis regulation, future proteomics and ribosome profiling studies should be performed.
According to our present results, after UBP1 overexpression, more protein kinases than protein phosphatases are affected. In T. brucei, the MAP kinase MAPKL1 (Tb927.10.10870) regulates proteins involved in mRNA metabolism (61), whereas, in UBP1-OE parasites, six CMGC family protein kinases with sequence similarities to MAPKL1 are upregulated. Thus, the transcripts of these protein kinases could be part of the downstream cascade involved in the phosphorylation network of T. cruzi. In contrast, the transcript levels of the TcAMPKs involved in autophagy and parasite nutrient sensing (62) do not seem to be regulated by TcUBP1.

Gene regulatory networks provide key strategies to identify RNA regulons and candidate RBPs for functional studies and/or molecular targets for disease control (63–66). The short sequence elements identified in this work could be signature marks for clusters of differentially upregulated genes (Fig. 7). In a preliminary computational work, we described a community of an RNA-protein interaction network composed of 26 T. cruzi RRM proteins (23) and 5 potential 3′-UTR regulatory motifs (Table 4).¹ Notably, among these proteins, we found UBP1, UBP2, RBP5A/B, and TcPABP1, which are five of the seven different trans-factors identified for family_1 and family_2 cis-regulatory sequences. Regarding the mRNA expression levels of RBPs in TcUBP1-GFP-expressing parasites, we observed that 15 genes were 3-fold upregulated and that 5 genes were 2-fold downregulated (see File S7). Among the upregulated RBP genes, two were associated with TcUBP1 by being part of the same RNA-protein community described above: TcRBP9A (TcCLB.511127.10) and TcRBP26A (TcCLB.506795.10). These two proteins may be controlled by TcUBP1 and could provide positive feedback by coregulating, together with TcUBP1, mRNA targets related to the trypomastigote-specific form. TcUBP1 is expressed in all the life cycle stages of T. cruzi and is involved in the formation of distinct regulatory complexes. TcUBP1 has been previously reported as an interacting partner of the cytoplasmic DRBD2-mRNP complex in epimastigotes, together with UBP2, DRBD3, and PABP2, among others (67). This mRNP complex has a different RBP composition, and possibly a different function, than the RNA-protein network mentioned above.
By overexpressing TbRBP6 in noninfectious procyclic trypanosomes, Kolev and co-workers recapitulated in vitro the generation of infective metacyclic forms observed in the tsetse fly (21). Similarly, forced expression of TbRBP10 in procyclic forms induces differentiation to bloodstream forms (46). Since the T. cruzi homologues of these two regulators, TcRBP6A (TcCLB.506693.30) and TcRBP10 B (TcCLB.510507.50), are among the upregulated RBP genes listed in File S7, it is tempting to speculate that TcUBP1 is upstream in the regulatory cascade that triggers parasite differentiation.
In summary, the transcriptome data presented here, obtained by overexpressing TcUBP1-GFP in the noninfectious epimastigote stage of T. cruzi, provide a comprehensive picture of mRNA steady-state levels during the differentiation process toward the infective stage. Our results extend previous reports from our laboratory and show that the levels of TcUBP1 trigger a posttranscriptional regulatory program that occurs during parasite differentiation, transforming replicative epimastigotes into infective quiescent metacyclic trypomastigotes.

Experimental procedures
Plasmid construction, parasite cultures, and transfection
The DNA construct pTcINDEX-TcUBP1-GFP previously used in Sabalette et al. (28) was used for parasite transfections. Protein expression values in Tet+ induced epimastigote samples after 96 h were determined relative to noninduced controls (Tet-) by Western blot analysis of GFP levels normalized to total protein loading, as measured by Coomassie Blue staining. T. cruzi epimastigotes of the CL Brener strain were cultured in BHT medium containing 10% heat-inactivated fetal calf serum (BHT 10%) at 28 °C. All parasite cultures were grown in plastic flasks without shaking, unless otherwise stated. Parasites were transfected by electroporation, sequentially with the pLew vector and the pTcINDEX constructs, and selected with 500 μg/ml G418 and 250 μg/ml hygromycin. For induction of recombinant proteins from the pTcINDEX vector, parasites were incubated in BHT 10% containing 0.5 μg/ml tetracycline for 96 h at 28 °C with shaking.

RNA preparation and RNA-Seq
Total RNA was prepared from approximately 10⁷ uninduced epimastigote cells and from 4-day Tet+ induced cells expressing TcUBP1. RNA from three biological replicates was prepared using the TRIzol reagent (Invitrogen) according to the manufacturer's instructions. The quality of the RNA samples was checked on a 1% agarose gel, and the RNA was quantified using a NanoDrop 2000 spectrophotometer (Thermo Scientific). Additional quality assessment of RNA integrity, isolation of poly(A)+ mRNA, library preparation, and sequencing on the DNBSeq platform were performed at the BGI Americas Corporation.

Overall quality parameters of the RNA-Seq data
The Bioanalyzer profile of the RNA-Seq libraries of both samples was generated on an Agilent 2100 instrument. The libraries were then subjected to paired-end (PE) deep sequencing using 2 × 100 PE chemistry on the DNBSeq platform, generating 5.3 GB of data per sample. After trimming of low-quality sequences, a total of 24M reads (11 GB) was obtained for each of the UBP1-OE and WT samples. To minimize genetic heterogeneity, we chose the CL Brener Esmeraldo-like reference genome (TriTrypDB-59_TcruziCLBrenerEsmeraldo-like_Genome.fasta), which has a genome size of 32.53 Mbp. The numbers of mapped reads obtained for the UBP1-OE and WT samples were 42,880,869 and 42,539,799, respectively.

Read processing and data analysis
Read processing and data analysis were performed as follows. Reads shorter than 50 bases were discarded to eliminate sequencing artifacts, and the quality of the reads was evaluated using the FASTQC toolkit (score >35) (68). The high-quality reads were mapped to the reference genome using bowtie2 with the parameter --very-sensitive-local. Samtools was used to index the output, and the quantitative assessment of reads was performed with featureCounts with parameters '-p -t "CDS" -g "ID" -T 40' (69). PCA was performed to ensure the quality of the data (Fig. S3). Differential gene expression analysis was conducted using DESeq2 (32). The resulting counts were used to identify differentially expressed transcripts, using as criteria a change of at least 2-fold (|log2 fold change| >1) in the sequence count between OE and WT samples and a Benjamini-Hochberg FDR-adjusted p value < 0.05. The Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values for each transcript were log-transformed and normalized, and were subsequently used to compute the distance matrix (Euclidean distance) for hierarchical clustering with the complete-linkage method. The R package pheatmap was used to construct the heatmap (https://cran.r-project.org/web/packages/pheatmap.html). The differentially expressed genes were used for GO term/KEGG pathway enrichment analyses in TriTrypDB, using a hypergeometric test (equivalent to a one-tailed Fisher's exact test) with an FDR threshold of 0.05. Volcano, GO enrichment, and violin plots were constructed in R with the package ggplot2 (69,70).
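As an illustration of the selection criteria stated above (|log2 fold change| >1 and Benjamini-Hochberg adjusted p value < 0.05), the following Python sketch adjusts a list of raw p values and labels each gene as up, down, or unchanged. It is not the DESeq2 procedure itself, which estimates fold changes and p values from the count data and applies its own filtering; the function names, gene identifiers, and input values here are hypothetical and serve only to show how the two thresholds combine.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    n = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def classify(genes: Dict[str, Tuple[float, float]],
             lfc_cutoff: float = 1.0,
             fdr_cutoff: float = 0.05) -> Dict[str, str]:
    """Label genes given (log2 fold change OE vs WT, raw p value) pairs."""
    ids = list(genes)
    padj = benjamini_hochberg([genes[g][1] for g in ids])
    calls = {}
    for gene_id, q in zip(ids, padj):
        lfc = genes[gene_id][0]
        if q < fdr_cutoff and lfc > lfc_cutoff:
            calls[gene_id] = "up"
        elif q < fdr_cutoff and lfc < -lfc_cutoff:
            calls[gene_id] = "down"
        else:
            calls[gene_id] = "unchanged"
    return calls


# Hypothetical toy input: gene ID -> (log2 fold change, raw p value).
toy = {
    "geneA": (2.3, 0.0001),
    "geneB": (-1.7, 0.004),
    "geneC": (0.4, 0.0002),   # significant but below the fold-change cutoff
    "geneD": (1.5, 0.2),      # large change but not significant
}
calls = classify(toy)
print(calls)
```

In this toy input, only geneA and geneB satisfy both criteria; geneC fails the fold-change cutoff and geneD fails the FDR cutoff.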

Functional annotation of gene lists
GO analysis of the differentially expressed genes was carried out using the TriTrypDB database (http://www.tritrypdb.org). The GO term distribution was analyzed for all three GO domains: biological process, molecular function, and cellular component. All T. cruzi genes were taken as the reference set, and each list of differentially expressed genes (up- or downregulated after UBP1-OE) was taken as a test set. The GO annotations were extracted and visualized as bubble charts using ggplot2 in R (69,70). In addition, to categorize the gene lists into overrepresented, functionally related groups, the functional annotation clustering tool of DAVID (Database for Annotation, Visualization and Integrated Discovery, version 6.8) was used (71). Groups with an "enrichment score" (ES) > 1.5 (defined as the minus logarithm of the geometric mean of p values) were considered significant (72).
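The comparison of a test set against a reference set described above can be illustrated with the hypergeometric test named under Read processing and data analysis. The Python sketch below computes the upper-tail probability of finding at least k annotated genes in a list of n genes drawn from a reference of N genes, K of which carry the GO term. It is a minimal illustration and not the TriTrypDB implementation; the function name and the toy numbers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the reference set
    K: reference genes annotated with the GO term
    n: genes in the test set (e.g., upregulated genes)
    k: test-set genes annotated with the GO term
    """
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ... annotated genes.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return favourable / total


# Hypothetical toy numbers: 20 reference genes, 5 carry the term,
# and 3 of the 4 genes in the test list carry it.
p_value = hypergeom_upper_tail(k=3, n=4, K=5, N=20)
print(f"one-tailed enrichment p value: {p_value:.4f}")
```

This upper-tail probability is the quantity that a one-tailed Fisher's exact test for overrepresentation returns on the corresponding 2 × 2 table.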

Data availability
RNA-Seq raw data files used in this study are available as FASTQ files of 100-bp paired-end reads in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database with the following study number: PRJNA907231.